More-than-multiplicative gene-gene or gene-environment interaction - Collinearity, why?

Rafael Nepomuceno

Join Date: Jan 2018

Posts: 4
#1


16 Jan 2018, 10:03

I'm Rafael Nepomuceno, a PhD student in Brazil, and I'm writing a paper about polymorphisms and periodontal disease.

I was trying to run a more-than-multiplicative gene-gene or gene-environment interaction analysis using logistic regression (i.e. Case/Control ~ Covariates + Smoking + SNP1 + Smoking×SNP1).

I wanted to run the interaction analysis by subgroup (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), showing the OR and p-value for each of the 4 possibilities (2 genotypes × 2 smoking statuses).

I am using Stata, and I ran this code:

logistic group SNP1 smoking ib(0).SNP1#ib(0).smoking age sex

However, in all of my analyses I got:

note: 1.snp11#0.smoking_m omitted because of collinearity
note: 1.snp11#1.smoking_m omitted because of collinearity

Logistic regression                             Number of obs     =        682
                                                LR chi2(7)        =      65.23
                                                Prob > chi2       =     0.0000
Log likelihood = -440.10673                     Pseudo R2         =     0.0690

---------------------------------------------------------------------------------
          group | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          snp11 |   1.434213     .26643     1.94   0.052     .9965244     2.06414
      smoking_m |   1.542746    1.07332     0.62   0.533     .3945443    6.032439
                |
snp11#smoking_m |
           0 1  |   1.204203    .882984     0.25   0.800     .2861243    5.068095
           1 0  |          1  (omitted)
           1 1  |          1  (omitted)
                |
            age |   1.049977   .0085603     5.98   0.000     1.033332     1.06689
            sex |    .682767   .1162793    -2.24   0.025     .4889989    .9533167
          _cons |   .0960629   .0377742    -5.96   0.000     .0444471    .2076197
---------------------------------------------------------------------------------

Do you know why I cannot get the OR and p-value for all 4 possibilities?
When I ran the same analysis with only the interaction term (SNP # smoking), without each variable entered separately (i.e. logistic group ib(0).SNP1#ib(0).smoking age sex), I could get the OR and p-value for each of the 4 subgroups (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), but I think that is not correct for an interaction analysis.
Clyde Schechter

Join Date: Apr 2014

Posts: 30087
#2

16 Jan 2018, 10:28

Well, these results suggest either that not all of the four combinations of SNP11 and smoking are instantiated in the data, or, if they are, that some of them are linearly predictable from sex (or, less likely, age). Try running:

Code:

table snp11 smoking sex

and see if there are cells with nobody in them.

If that doesn't solve your problem, I think you will need to post an example of your data for further advice. Do read FAQ #12 before doing that, so that you will understand how to do that properly with the -dataex- command.

"I could get the OR and p-value for each of the 4 subgroups (i.e. genotype 1.1 non-smoking, genotype 1.2 or 2.2 non-smoking, genotype 1.1 smoking, ...), but I think that is not correct for an interaction analysis."

You are correct, that would be an improper model.
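For comparison, here is a minimal sketch of a specification that keeps both main effects and the interaction, using Stata's factor-variable ## operator. It is not something confirmed in this thread: the variable names are taken from the output in #1, and snp11 and smoking_m are assumed to be coded 0/1 as that output suggests. One possible reading of the omission notes in #1 is that, when the main effects are entered without the i. prefix alongside a full # interaction, the "1 0" and "1 1" cells are exact linear combinations of the other terms, so Stata has to drop them.

```stata
* Sketch only. Assumes group, snp11, smoking_m, age and sex are named as in
* the output in #1, and that snp11 and smoking_m are both coded 0/1.
* "##" expands to both main effects plus their interaction.
logistic group i.snp11##i.smoking_m age sex

* Odds ratios for each subgroup versus the reference cell
* (snp11 = 0, smoking_m = 0):
* snp11 = 1, smoking_m = 0
lincom 1.snp11, or
* snp11 = 0, smoking_m = 1
lincom 1.smoking_m, or
* snp11 = 1, smoking_m = 1
lincom 1.snp11 + 1.smoking_m + 1.snp11#1.smoking_m, or
```

In this parameterization the row for 1.snp11#1.smoking_m in the logistic output is the multiplicative interaction term itself, while the lincom lines recover the subgroup odds ratios relative to the reference cell.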
Rafael Nepomuceno

Join Date: Jan 2018

Posts: 4
#3

16 Jan 2018, 10:55

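. table snp11 smoking_m sex

----------------------------------------
          |      sex and smoking_m
          | ---- 0 ----    ---- 1 ----
    snp11 |     0      1       0      1
----------+-----------------------------
        0 |    87     55     180     57
        1 |    53     36     127     37
        2 |    12      9      24      5
----------------------------------------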

This is the result

Last edited by Rafael Nepomuceno; 16 Jan 2018, 10:57.
Rafael Nepomuceno

Join Date: Jan 2018

Posts: 4
#4

16 Jan 2018, 10:59

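. table snp11 smoking_m sex

----------------------------------------
          |       sex and smoking
          | ---- 0 ----    ---- 1 ----
    snp11 |     0      1       0      1
----------+-----------------------------
        0 |    87     55     180     57
        1 |    53     36     127     37
        2 |    12      9      24      5
----------------------------------------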
Rafael Nepomuceno

Join Date: Jan 2018

Posts: 4
#5

16 Jan 2018, 11:08

This is the result:

Attached Files
Matt Warkentin

Join Date: May 2016

Posts: 104
#6

16 Jan 2018, 11:16

You could perhaps try the following code, which treats SNP1 as an ordered allele count (0, 1, or 2) with a fixed per-allele OR:

Code:

logistic group c.SNP1#i.smoking age i.sex , base

This should give you the per-allele odds ratio within each smoking group.
Clyde Schechter

Join Date: Apr 2014

Posts: 30087
#7

16 Jan 2018, 15:08

Re #4. Well, there aren't any zeroes. But there are some small cells. Perhaps those become zeroes in the estimation sample (where anybody missing age would be excluded.) If it's not that, I really would need to see the data to troubleshoot.
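A minimal sketch of how the estimation-sample check could be done, assuming the model from #1 is re-run first and that the variables are named as in its output (group, snp11, smoking_m, age, sex):

```stata
* Sketch only: variable names are taken from the output in #1.
* Re-run the model so that e(sample) marks the observations it actually used.
logistic group snp11 smoking_m ib(0).snp11#ib(0).smoking_m age sex

* Repeat the three-way table, restricted to the estimation sample.
table snp11 smoking_m sex if e(sample)

* Count the observations that were excluded from the estimation sample.
count if !e(sample)
```

If any cell in the restricted table is empty, that would be consistent with the empty-cell explanation; if none is, the cause lies elsewhere.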